Empirical Methods

Quartiles: Q1 is the 25th percentile, Q2 is the 50th percentile (the median), and Q3 is the 75th percentile.

Non-parametric tests make fewer assumptions than parametric tests. Examples include:

  • Wilcoxon-Mann-Whitney U Test
  • Kruskal-Wallis Test
  • Robust Rank Order Test

Adjacent values are the largest and smallest values that are still considered part of a dataset; values beyond them are outliers. The cut-off is taken to be twice the IQR from the median.

Cards

Q: What do the quartiles correspond to? A: Q1 - 25th, Q2 - 50th, Q3 - 75th
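
As a quick illustration of the quartile card, the sketch below computes Q1, Q2 and Q3 with Python's standard `statistics` module. The function name `quartiles` and the choice of the "inclusive" interpolation method are assumptions made for this example; other interpolation methods can give slightly different cut points on small datasets.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) for a sequence of numbers."""
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" treats the data as the whole population
    # (an assumption for this sketch, not something the notes specify).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (3.0, 5.0, 7.0)
```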

Q: What is a Limit Order Book (LOB) based exchange? A: An exchange where buyers submit “bids” and sellers submit “asks”; when a bid price meets or exceeds an ask price, a trade is made.
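
The matching idea can be sketched in a few lines. This is a deliberately simplified toy, not how any real exchange works: the function name `match_orders` is hypothetical, orders are bare prices with no quantities or timestamps, and no rule is given for the price at which a matched trade executes.

```python
import heapq

def match_orders(bids, asks):
    """Pair bids with asks for as long as the best bid >= the best ask."""
    bid_heap = [-price for price in bids]  # negate so the highest bid pops first
    ask_heap = list(asks)                  # lowest ask pops first
    heapq.heapify(bid_heap)
    heapq.heapify(ask_heap)

    trades = []
    while bid_heap and ask_heap and -bid_heap[0] >= ask_heap[0]:
        bid = -heapq.heappop(bid_heap)
        ask = heapq.heappop(ask_heap)
        trades.append((bid, ask))
    return trades

print(match_orders([101, 99, 100], [100, 102, 98]))  # [(101, 98), (100, 100)]
```

The remaining bid of 99 and ask of 102 do not overlap, so they would stay on the book.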

Q: What is the difference between a parametric and non-parametric test? A: A parametric test makes assumptions about the underlying distribution; non-parametric tests make far fewer assumptions.

Q: What is a confidence interval? A: Two values between which we say the mean lies with some probability $1-\alpha$. $\alpha$ is also known as the significance level; $0.1$ is a common choice.

Q: What are some ways of getting a confidence interval? A:

  • Use percentiles as the bounds, e.g. the 5th and 95th for a 90% CI.
  • Use the Central Limit Theorem: for large samples ($n > 30$), we can treat the sample means as coming from a normal distribution with the correct mean.
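
Both approaches can be sketched with the standard library. The function names `percentile_ci` and `clt_ci` are made up for this example, and two assumptions are worth flagging: the percentile version simply takes percentiles of whatever values it is given (the notes do not say whether these are raw observations or resampled means), and the CLT version uses the usual normal-approximation formula, mean ± z × s/√n, which is not spelled out in the notes.

```python
import math
import statistics

def percentile_ci(values, lower_pct=5, upper_pct=95):
    """Use two percentiles of `values` as the interval bounds."""
    # n=100 yields 99 cut points; index p-1 holds the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def clt_ci(sample, alpha=0.1):
    """Normal-approximation CI for the mean, intended for n > 30."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided interval: put alpha/2 in each tail of the normal curve.
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return mean - z * std_err, mean + z * std_err

sample = list(range(1, 41))   # illustrative data only
print(percentile_ci(sample))  # 5th and 95th percentiles
print(clt_ci(sample))         # 90% CI for the mean
```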

Q: How do we compare two datasets with confidence intervals? A:

  • If the CIs don’t overlap, we can say there is a significant difference.
  • If they overlap and at least one mean lies inside the other’s CI, we cannot claim a difference.
  • If they overlap but neither mean lies inside the other’s CI, the comparison is inconclusive; run a formal statistical test.
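
The three rules translate directly into a small decision function. The name `compare_cis` and the returned labels are hypothetical, chosen only for this sketch; each CI is passed as a `(low, high)` pair.

```python
def compare_cis(mean_a, ci_a, mean_b, ci_b):
    """Apply the three CI-comparison rules; each CI is a (low, high) pair."""
    a_lo, a_hi = ci_a
    b_lo, b_hi = ci_b

    # Rule 1: the intervals do not overlap at all.
    if a_hi < b_lo or b_hi < a_lo:
        return "different"

    # Rule 2: they overlap and at least one mean sits inside the other CI.
    if a_lo <= mean_b <= a_hi or b_lo <= mean_a <= b_hi:
        return "no difference"

    # Rule 3: they overlap, but neither mean is inside the other CI.
    return "inconclusive"

print(compare_cis(10, (8, 12), 20, (18, 22)))  # different
print(compare_cis(10, (8, 12), 11, (9, 13)))   # no difference
print(compare_cis(10, (8, 12), 13, (11, 15)))  # inconclusive
```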

Q: What are the usual assumptions for parametric tests? A:

  • Independent observations (unless paired data)
  • Observations are random draws from a normal/Gaussian distribution
  • Measured variable is measured on at least an interval scale
  • $n \ge 30$ usually advised
  • Populations have equal variance
  • Hypotheses are usually about numerical values, e.g. the mean

Q: What are the usual assumptions for non-parametric tests? A:

  • Independent observations (unless paired data)
  • Few assumptions regarding distribution
  • Scale of measurement may be nominal or ordinal
  • Primary focus is on rank-ordering or frequency of data
  • Hypotheses usually about ranks, medians, or frequencies of data
  • Sample size requirements usually less strict

Q: What is the difference between nominal and ordinal? A: Ordinal data can be both categorised and ranked; nominal data can only be categorised.

Q: What is the Wilcoxon-Mann-Whitney U Test? A:

  • Tests whether two independent samples of a continuous dependent variable come from the same population
  • Non-parametric version of the independent t-test
  • Compares measures of central tendency
  • Uses the median instead of the mean, with no assumption of a normal distribution
  • Involves combining the two samples, ranking all values from low to high, and summing the ranks for each sample
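
The ranking step in the last bullet can be sketched as follows. The function name `rank_sums_and_u` is hypothetical. Two things go beyond the notes and are assumptions here: tied values share the average of the ranks they occupy, and the U statistic is derived from each rank sum with the standard formula U = R − n(n+1)/2. Turning U into a p-value is left out.

```python
def rank_sums_and_u(sample_a, sample_b):
    """Pool two samples, rank them, and return (R_a, R_b, U)."""
    pooled = sorted(list(sample_a) + list(sample_b))

    # Map each distinct value to its rank (1-based); ties get the average rank.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    r_a = sum(rank_of[v] for v in sample_a)
    r_b = sum(rank_of[v] for v in sample_b)

    # Standard conversion from rank sums to U (not stated in the notes).
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    return r_a, r_b, min(u_a, u_b)

print(rank_sums_and_u([1, 2, 3], [4, 5, 6]))  # (6.0, 15.0, 0.0)
```

A U of 0 is the most extreme case: every value in one sample ranks below every value in the other.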

Q: What are the assumptions of the Wilcoxon-Mann-Whitney U Test? A:

  • Independent variable has two groups
  • Dependent variable’s measurement scale is at least ordinal
  • Data are randomly selected samples from two independent groups
  • Population distributions of the two groups have a similar shape, but may differ in central tendency
  • Avoid running multiple pairwise tests: accuracy falls off as the number of comparisons grows

Q: What is the difference between the Wilcoxon-Mann-Whitney U Test and Kruskal-Wallis? A: WMWU tests a pair of groups for equality; KW tests more than two groups at once.
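
To make the contrast concrete, here is a rough sketch of the Kruskal-Wallis H statistic, which applies the same pool-and-rank idea to any number of groups. The notes do not give the formula; the standard form H = 12 / (N(N+1)) × Σ(R_i² / n_i) − 3(N+1) is used here, without the correction for ties, and the function names are hypothetical.

```python
def average_ranks(values):
    """Map each distinct value to its 1-based rank; ties share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return rank_of

def kruskal_wallis_h(*groups):
    """H statistic over two or more groups (no tie correction applied)."""
    pooled = [v for group in groups for v in group]
    n_total = len(pooled)
    rank_of = average_ranks(pooled)

    # Sum of (rank sum squared / group size) across the groups.
    weighted = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Three clearly separated groups give a larger H than interleaved ones.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(kruskal_wallis_h([1, 4, 7], [2, 5, 8], [3, 6, 9]))
```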

Q: What are the adjacent values on a box-and-whisker plot? A: They indicate the extents of the tails of a distribution. The upper adjacent value is the largest value in the dataset that is less than twice the IQR above the median; the lower adjacent value is the smallest value within the same distance below it.
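
A minimal sketch of that definition is given below. The function name `adjacent_values` and the parameter `k` are hypothetical. The default `k=2.0` measured from the median follows these notes; many textbooks instead use 1.5 × IQR measured from Q1 and Q3, so treat the rule as an assumption to check against the course material.

```python
import statistics

def adjacent_values(data, k=2.0):
    """Return (lower adjacent, upper adjacent, outliers) for `data`."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Cut-offs sit k * IQR either side of the median (per the notes).
    lower_limit = median - k * iqr
    upper_limit = median + k * iqr

    inside = [x for x in data if lower_limit < x < upper_limit]
    outliers = [x for x in data if not (lower_limit < x < upper_limit)]
    return min(inside), max(inside), outliers

print(adjacent_values([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))  # (1, 9, [30])
```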